99 research outputs found

    Algorithms for RNA secondary structure analysis : prediction of pseudoknots and the consensus shapes approach

    Get PDF
    Reeder J. Algorithms for RNA secondary structure analysis : prediction of pseudoknots and the consensus shapes approach. Bielefeld (Germany): Bielefeld University; 2007.Our understanding of the role of RNA has undergone a major change in the last decade. Once believed to be only a mere carrier of information and structural component of the ribosomal machinery in the advent of the genomic age, it is now clear that RNAs play a much more active role. RNAs can act as regulators and can have catalytic activity - roles previously only attributed to proteins. There is still much speculation in the scientific community as to what extent RNAs are responsible for the complexity in higher organisms which can hardly be explained with only proteins as regulators. In order to investigate the roles of RNA, it is therefore necessary to search for new classes of RNA. For those and already known classes, analyses of their presence in different species of the tree of life will provide further insight about the evolution of biomolecules and especially RNAs. Since RNA function often follows its structure, the need for computer programs for RNA structure prediction is an immanent part of this procedure. The secondary structure of RNA - the level of base pairing - strongly determines the tertiary structure. As the latter is computationally intractable and experimentally expensive to obtain, secondary structure analysis has become an accepted substitute. In this thesis, I present two new algorithms (and a few variations thereof) for the prediction of RNA secondary structures. The first algorithm addresses the problem of predicting a secondary structure from a single sequence including RNA pseudoknots. Pseudoknots have been shown to be functionally relevant in many RNA mediated processes. However, pseudoknots are excluded from considerations by state-of-the-art RNA folding programs for reasons of computational complexity. While folding a sequence of length n into unknotted structures requires O(n^3) time and O(n^2) space, finding the best structure including arbitrary pseudoknots has been proven to be NP-complete. Nevertheless, I demonstrate in this work that certain types of pseudoknots can be included in the folding process with only a moderate increase of computational cost. In analogy to protein coding RNA, where a conserved encoded protein hints at a similar metabolic function, structural conservation in RNA may give clues to RNA function and to finding of RNA genes. However, structure conservation is more complex to deal with computationally than sequence conservation. The method considered to be at least conceptually the ideal approach in this situation is the Sankoff algorithm. It simultaneously aligns two sequences and predicts a common secondary structure. Unfortunately, it is computationally rather expensive - O(n^6) time and O(n^4) space for two sequences, and for more than two sequences it becomes exponential in the number of sequences! Therefore, several heuristic implementations emerged in the last decade trying to make the Sankoff approach practical by introducing pragmatic restrictions on the search space. In this thesis, I propose to redefine the consensus structure prediction problem in a way that does not imply a multiple sequence alignment step. For a family of RNA sequences, my method explicitly and independently enumerates the near-optimal abstract shape space and predicts an abstract shape as the consensus for all sequences. For each sequence, it delivers the thermodynamically best structure which has this shape. The technique of abstract shapes analysis is employed here for a synoptic view of the suboptimal folding space. As the shape space is much smaller than the structure space, and identification of common shapes can be done in linear time (in the number of shapes considered), the method is essentially linear in the number of sequences. Evaluations show that the new method compares favorably with available alternatives

    Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics

    Get PDF
    BACKGROUND: The general problem of RNA secondary structure prediction under the widely used thermodynamic model is known to be NP-complete when the structures considered include arbitrary pseudoknots. For restricted classes of pseudoknots, several polynomial time algorithms have been designed, where the O(n(6))time and O(n(4)) space algorithm by Rivas and Eddy is currently the best available program. RESULTS: We introduce the class of canonical simple recursive pseudoknots and present an algorithm that requires O(n(4)) time and O(n(2)) space to predict the energetically optimal structure of an RNA sequence, possible containing such pseudoknots. Evaluation against a large collection of known pseudoknotted structures shows the adequacy of the canonization approach and our algorithm. CONCLUSIONS: RNA pseudoknots of medium size can now be predicted reliably as well as efficiently by the new algorithm

    pknotsRG: RNA pseudoknot folding including near-optimal structures and sliding windows

    Get PDF
    RNA pseudoknots are an important structural feature of RNAs, but often neglected in computer predictions for reasons of efficiency. Here, we present the pknotsRG Web Server for single sequence RNA secondary structure prediction including pseudoknots. pknotsRG employs the newest Turner energy rules for finding the structure of minimal free energy. The algorithm has been improved in several ways recently. First, it has been reimplemented in the C programming language, resulting in a 60-fold increase in speed. Second, all suboptimal foldings up to a user-defined threshold can be enumerated. For large scale analysis, a fast sliding window mode is available. Further improvements of the Web Server are a new output visualization using the PseudoViewer Web Service or RNAmovies for a movie like animation of several suboptimal foldings

    Shape based indexing for faster search of RNA family databases

    Get PDF
    Janssen S, Reeder J, Giegerich R. Shape based indexing for faster search of RNA family databases. BMC Bioinformatics. 2008;9(1):131.Background: Most non-coding RNA families exert their function by means of a conserved, common secondary structure. The Rfam data base contains more than five hundred structurally annotated RNA families. Unfortunately, searching for new family members using covariance models (CMs) is very time consuming. Filtering approaches that use the sequence conservation to reduce the number of CM searches, are fast, but it is unknown to which sacrifice. Results: We present a new filtering approach, which exploits the family specific secondary structure and significantly reduces the number of CM searches. The filter eliminates approximately 85% of the queries and discards only 2.6% true positives when evaluating Rfam against itself. First results also capture previously undetected non-coding RNAs in a recent human RNAz screen. Conclusion: The RNA shape index filter (RNAsifter) is based on the following rationale: An RNA family is characterised by structure, much more succinctly than by sequence content. Structures of individual family members, which naturally have different length and sequence composition, may exhibit structural variation in detail, but overall, they have a common shape in a more abstract sense. Given a fixed release of the Rfam data base, we can compute these abstract shapes for all families. This is called a shape index. If a query sequence belongs to a certain family, it must be able to fold into the family shape with reasonable free energy. Therefore, rather than matching the query against all families in the data base, we can first (and quickly) compute its feasible shape(s), and use the shape index to access only those families where a good match is possible due to a common shape with the query

    Extrapolation of Urn Models via Poissonization: Accurate Measurements of the Microbial Unknown

    Get PDF
    The availability of high-throughput parallel methods for sequencing microbial communities is increasing our knowledge of the microbial world at an unprecedented rate. Though most attention has focused on determining lower-bounds on the alpha-diversity i.e. the total number of different species present in the environment, tight bounds on this quantity may be highly uncertain because a small fraction of the environment could be composed of a vast number of different species. To better assess what remains unknown, we propose instead to predict the fraction of the environment that belongs to unsampled classes. Modeling samples as draws with replacement of colored balls from an urn with an unknown composition, and under the sole assumption that there are still undiscovered species, we show that conditionally unbiased predictors and exact prediction intervals (of constant length in logarithmic scale) are possible for the fraction of the environment that belongs to unsampled classes. Our predictions are based on a Poissonization argument, which we have implemented in what we call the Embedding algorithm. In fixed i.e. non-randomized sample sizes, the algorithm leads to very accurate predictions on a sub-sample of the original sample. We quantify the effect of fixed sample sizes on our prediction intervals and test our methods and others found in the literature against simulated environments, which we devise taking into account datasets from a human-gut and -hand microbiota. Our methodology applies to any dataset that can be conceptualized as a sample with replacement from an urn. In particular, it could be applied, for example, to quantify the proportion of all the unseen solutions to a binding site problem in a random RNA pool, or to reassess the surveillance of a certain terrorist group, predicting the conditional probability that it deploys a new tactic in a next attack.Comment: 14 pages, 7 figures, 4 table

    KnotInFrame: prediction of −1 ribosomal frameshift events

    Get PDF
    Programmed −1 ribosomal frameshift (−1 PRF) allows for alternative reading frames within one mRNA. First found in several viruses, it is now believed to exist in all kingdoms of life. Strong stimulators for −1 PRF are a heptameric slippery site and an RNA pseudoknot. Here, we present a new algorithm KnotInFrame, for the automatic detection of −1 PRF signals from genomic sequences. It finds the frameshifting stimulators by means of a specialized RNA-pseudoknot folding program, fast enough for genome-wide analyses. Evaluations on known −1 PRF signals demonstrate a high sensitivity

    Defining seasonal marine microbial community dynamics

    Get PDF
    Here we describe, the longest microbial time-series analyzed to date using high-resolution 16S rRNA tag pyrosequencing of samples taken monthly over 6 years at a temperate marine coastal site off Plymouth, UK. Data treatment effected the estimation of community richness over a 6-year period, whereby 8794 operational taxonomic units (OTUs) were identified using single-linkage preclustering and 21 130 OTUs were identified by denoising the data. The Alphaproteobacteria were the most abundant Class, and the most frequently recorded OTUs were members of the Rickettsiales (SAR 11) and Rhodobacteriales. This near-surface ocean bacterial community showed strong repeatable seasonal patterns, which were defined by winter peaks in diversity across all years. Environmental variables explained far more variation in seasonally predictable bacteria than did data on protists or metazoan biomass. Change in day length alone explains >65% of the variance in community diversity. The results suggested that seasonal changes in environmental variables are more important than trophic interactions. Interestingly, microbial association network analysis showed that correlations in abundance were stronger within bacterial taxa rather than between bacteria and eukaryotes, or between bacteria and environmental variables

    Three classes of hemoglobins are required for optimal vegetative and reproductive growth of Lotus japonicus: genetic and biochemical characterization of LjGlb2-1

    Get PDF
    Legumes express two major types of hemoglobins, namely symbiotic (leghemoglobins) and non-symbiotic (phytoglobins), with the latter being categorized into three classes according to phylogeny and biochemistry. Using knockout mutants, we show that all three phytoglobin classes are required for optimal vegetative and reproductive development of Lotus japonicus. The mutants of two class 1 phytoglobins showed different phenotypes: Ljglb1-1 plants were smaller and had relatively more pods, whereas Ljglb1-2 plants had no distinctive vegetative phenotype and produced relatively fewer pods. Non-nodulated plants lacking LjGlb2-1 showed delayed growth and alterations in the leaf metabolome linked to amino acid processing, fermentative and respiratory pathways, and hormonal balance. The leaves of mutant plants accumulated salicylic acid and contained relatively less methyl jasmonic acid, suggesting crosstalk between LjGlb2-1 and the signaling pathways of both hormones. Based on the expression of LjGlb2-1 in leaves, the alterations of flowering and fruiting of nodulated Ljglb2-1 plants, the developmental and biochemical phenotypes of the mutant fed on ammonium nitrate, and the heme coordination and reactivity of the protein toward nitric oxide, we conclude that LjGlb2-1 is not a leghemoglobin but an unusual class 2 phytoglobin. For comparison, we have also characterized a close relative of LjGlb2-1 in Medicago truncatula, MtLb3, and conclude that this is an atypical leghemoglobin
    corecore